Classification of cancer
Field of invention 

The present invention relates to a method for classification of cancer in an
individual, wherein the microsatellite status and a prognostic marker are determined
by examining gene expression patterns. The invention also relates to various
methods of treatment of cancer. Additionally, the present invention concerns a
pharmaceutical composition for treatment of cancer and uses of the present
invention. The invention also relates to an assay for classification of cancer.

Background of invention 

Studies of differential gene expression in diseased and normal tissues have been
greatly facilitated by the building of large databases of human genome
sequences. Gene expression alterations are important factors in the progression
from normal tissue to diseased tissue. In order to obtain a profile of transcriptional
status in a certain cell type or tissue, array-based screening of thousands of genes
simultaneously is an invaluable tool. Array-based screening even allows for the
identification of key genes that alone, or in combination with other genes, regulate
the behaviour of a cell or tissue. Candidate genes for future therapeutic intervention
may thus also be identified.

Colorectal cancer occurs in about 1 out of every 20 individuals at some point
during their lifetime. In the United States alone about 150,000 new cases are
diagnosed each year, which amounts to 15% of the total number of new cancer
diagnoses. Unfortunately, colorectal cancer causes about 56,000 deaths a year in
the United States.

The malignant transformation from normal tissue to cancer is believed to be a
multistep process. Two molecular pathways are known to be involved in the
development of colorectal cancer (Lengauer C, Kinzler KW, Vogelstein B, 1998),
namely the microsatellite stable (MSS) pathway and the microsatellite instable (MSI)
pathway. MSS is associated with a high frequency of allelic losses, cytogenetic
abnormalities and abnormal tumor DNA content. MSI, however, is associated
with defects in the DNA mismatch repair system, which lead to an increased rate of
point mutations and minor chromosomal insertions or deletions.

MSI tumors can be of hereditary or sporadic nature. Ninety percent of MSI tumors
are of sporadic origin. Sporadic tumors are presumably MSI due to epigenetic
hypermethylation of the MLH1 gene promoter. The hereditary tumors account for
10% of the MSI tumors. Mutations of, for example, the MLH1 or MSH2 genes are
often the cause of hereditary tumor development.

The ability to determine the sporadic or hereditary nature of an MSI
tumor is highly valuable. If a tumor is characterized as MSI and certain
clinical criteria are fulfilled, such as age below 50 or three first-degree relatives with
colon cancer, a screening programme of family members for early diagnosis and
treatment of potential colon or endometrial cancer development is initiated. The
human and economic costs of such screening programmes are severe.
Consequently, a need exists for identifying colon cancers with a hereditary
character. Further, these patients have a poor prognosis, as they have an increased risk
of metachronous colon tumors and a highly increased risk of getting cancer in the
endometrium (females), upper urinary tract and a number of other organs. Thus,
one may regard the determination of a colon tumor as being sporadic or hereditary
as determination of a prognostic factor.

Tumors appearing to be similar - morphologically, histochemically or
microscopically - can be profoundly different. They can have different invasive and
metastasizing properties, as well as respond differently to therapy. There is thus a
need in the art for methods which distinguish tumors and tissues on different bases
than are currently in use in the clinic. Determination of microsatellite status using an
array-based methodology is faster than conventional DNA-based methods, as it
does not require microdissection. It also yields a set of genes that can be combined
with other sets of genes on a colon cancer array, which can then be used to determine
microsatellite status as well as, e.g., to predict disease course by identifying hereditary
cases or other prognostically important factors, and finally to predict therapy response.

Summary of invention 

In one aspect the present invention relates to a method of classifying cancer in an
individual having contracted cancer, comprising

in a sample from the individual having contracted cancer, determining the
microsatellite status of the tumor, and

in a sample from the individual having contracted cancer, said sample comprising a
plurality of gene expression products the presence and/or amount of which forms a
pattern, determining from said pattern a prognostic marker, wherein the
microsatellite status and the prognostic marker are determined simultaneously or
sequentially, and

classifying said cancer from the microsatellite status and the prognostic marker.

The cancer may be any cancer known to be microsatellite instable in at least a
fraction of the cases, such as colon cancer, uterine cancer, ovary cancer, stomach
cancer, cancer in the small intestine, cancer in the biliary system, urinary tract cancer,
brain cancer or skin cancer. These cancers are part of the spectrum of cancers that
belong to the hereditary non-polyposis colon cancer syndrome, but the invention is
not limited to this syndrome.

Gene expression patterns may be formed by only a few genes, but it is also a
preferred embodiment that a multiplicity of genes form the expression pattern whereby
information for classification of cancer can be obtained.

Furthermore, the invention relates to a method for classification of cancer in an
individual having contracted cancer, wherein the microsatellite status is determined by a
method comprising the steps of

in a sample from the individual having contracted cancer, said sample comprising a
plurality of gene expression products the presence and/or amount of which forms a
pattern that is indicative of the microsatellite status of said cancer,

determining the presence and/or amount of said gene expression products forming
said pattern,

obtaining an indication of the microsatellite status of said cancer in the individual
based on the step above.

Yet another aspect of the invention relates to a method for classification of cancer in
an individual having contracted cancer, wherein the hereditary or sporadic nature is
determined by a method comprising the steps of

in a sample from the individual having contracted cancer, said sample comprising a
plurality of gene expression products the presence and/or amount of which forms a
pattern that is indicative of the hereditary or sporadic nature of said cancer,

determining the presence and/or amount of said gene expression products forming
said pattern,

obtaining an indication of the hereditary or sporadic nature of said cancer in the
individual based on the step above.

The present invention further concerns a method for treatment of an individual
comprising the steps of

selecting an individual having contracted a colon cancer, wherein the microsatellite
status is stable, determined according to any of the methods as defined herein,

treating the individual with anti-cancer drugs.

Another aspect of the present invention relates to a method for treatment of an
individual comprising the steps of

selecting an individual having contracted a colon cancer, wherein the microsatellite
status is instable, determined according to any of the methods as defined herein,

treating the individual with anti-cancer drugs.

Yet another aspect of the present invention relates to a method for reducing
malignancy of a cell, said method comprising

contacting a tumor cell in question with at least one peptide expressed by at least
one gene selected from genes being expressed at least two-fold higher in tumor
cells than the amount expressed in said tumor cell in question.

Additionally, the present invention concerns a method for reducing malignancy of a
tumor cell in question comprising

obtaining at least one gene selected from genes being expressed at least two-fold
lower in tumor cells than the amount expressed in normal cells,

introducing said at least one gene into the tumor cell in question in a manner
allowing expression of said gene(s).

The invention also relates to a method for reducing malignancy of a cell in question,
said method comprising

obtaining at least one nucleotide probe capable of hybridising with at least one gene
of a tumor cell in question, said at least one gene being selected from genes being
expressed in an amount at least two-fold higher in tumor cells than the amount
expressed in normal cells, and

introducing said at least one nucleotide probe into the tumor cell in question in a
manner allowing the probe to hybridise to the at least one gene, thereby inhibiting
expression of said at least one gene.
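
The two-fold thresholds named in the methods above can be illustrated with a short
sketch. It assumes that expression is available as one signal value per gene for tumor
cells and for normal cells. The function name, the gene labels and the numbers are
hypothetical and are not taken from the data of the invention.

```python
def split_by_fold_change(tumor, normal, fold=2.0):
    """Return (higher, lower): genes at least `fold` times higher or lower
    in tumor cells than in normal cells. Inputs map gene label -> signal."""
    higher, lower = [], []
    for gene in sorted(tumor.keys() & normal.keys()):
        t, n = tumor[gene], normal[gene]
        if t > 0 and t >= fold * n:
            higher.append(gene)   # candidate target for an inhibiting probe
        elif n > 0 and n >= fold * t:
            lower.append(gene)    # candidate gene to introduce into the cell
    return higher, lower


# Hypothetical signal values, for illustration only.
tumor = {"GENE_A": 800.0, "GENE_B": 120.0, "GENE_C": 40.0}
normal = {"GENE_A": 300.0, "GENE_B": 100.0, "GENE_C": 95.0}
higher, lower = split_by_fold_change(tumor, normal)
print(higher, lower)
```

Genes that differ by less than the chosen factor in either direction, such as GENE_B
here, fall into neither list.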

In a further aspect the invention relates to a method for producing antibodies against
an expression product of a cell from a biological tissue, said method comprising the
steps of

obtaining expression product(s) from at least one gene, said gene being expressed
as defined herein,

immunising a mammal with said expression product(s), and

obtaining antibodies against the expression product.

The present invention also concerns a method for treatment of an individual
comprising the steps of

selecting an individual having contracted a colon cancer, wherein the microsatellite
status is stable, determined according to any of the methods as defined herein,

introducing at least one gene into the tumor cell in a manner allowing expression of
said gene(s).

The present invention further relates to a pharmaceutical composition for the
treatment of a classified cancer comprising at least one antibody as defined herein.

In yet another aspect the invention concerns a pharmaceutical composition for the
treatment of a classified cancer comprising at least one polypeptide as defined
herein.

Further, the invention relates to a pharmaceutical composition for the treatment of a
classified cancer comprising at least one nucleic acid and/or probe as defined
herein.

In an additional aspect the present invention relates to an assay for classification of
cancer in an individual having contracted cancer, comprising

at least one marker capable of determining the microsatellite status in a sample, and

at least one marker in a sample determining the prognostic marker, wherein the
microsatellite status and the prognostic marker are determined simultaneously or
sequentially.

Detailed description of the drawings
Figure 1 

Unsupervised hierarchical clustering of colorectal tumors based on the 1239 
genes with the highest variation across all tumors. 

The phylogenetic tree shows the spontaneous clustering of tumor samples and
normal biopsies. Germline mutation indicates samples with hereditary mutations in
either the MLH1 or MSH2 gene. In columns referring to results of
immunohistochemistry, a plus indicates a positive antibody staining. Tumor location
indicates right-sided or left-sided location of the tumor in the colon.
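
The selection of the genes with the highest variation, which precedes the clustering
in Figure 1, might be sketched as follows. The measure of variation is not named in
the text; sample variance is assumed here, and the function name, gene labels and
signal values are hypothetical.

```python
from statistics import variance


def most_variable_genes(expression, n_genes):
    """Rank genes by sample variance across all samples and keep the top n_genes.
    `expression` maps gene label -> list of signal values, one per sample."""
    ranked = sorted(expression, key=lambda g: variance(expression[g]), reverse=True)
    return ranked[:n_genes]


# Hypothetical signals for three genes across four samples.
expression = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],   # no variation
    "GENE_B": [1.0, 5.0, 1.0, 5.0],   # high variation
    "GENE_C": [2.0, 3.0, 2.0, 3.0],   # moderate variation
}
selected = most_variable_genes(expression, 2)
print(selected)
```

With the data of Figure 1 the same call would use `n_genes=1239`; the reduced
expression table would then be passed to a hierarchical clustering routine.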

Figure 2

Summary of the performance of the microsatellite instability classifier based
on microarray data.

Panel A shows the number of classification errors as a function of the number of
genes used. Panel B shows log2 of the ratio of the distances from a tumor to the
centers of the microsatellite instable group and the microsatellite stable group. A
value of +2 indicates that the distance of a tumor to the microsatellite instable group
is 4 times the distance to the microsatellite stable group. Open bars are MSI tumors
and solid bars are MSS tumors. Panel C shows the result of the permutation
analysis for estimation of the stability of the classifier. This was estimated by generating
one hundred new classifiers based on randomly chosen datasets from the 101
tumors, each consisting of 30 microsatellite stable and 25 microsatellite instable
samples. In each case the classifier was tested with the remaining 46 samples. The
performance for each set was evaluated and averaged over all 100 training and test
sets.
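
The quantity plotted in Panel B can be reproduced with a minimal sketch. The text
does not name the distance measure, so Euclidean distance to the group mean is
assumed here; all function names and numbers are illustrative only.

```python
import math


def center(samples):
    """Mean expression profile of a group of equal-length profiles."""
    return [sum(values) / len(samples) for values in zip(*samples)]


def distance(a, b):
    """Euclidean distance (an assumption; the metric is not stated in the text)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log2_distance_ratio(tumor, msi_center, mss_center):
    """log2(distance to the MSI center / distance to the MSS center)."""
    return math.log2(distance(tumor, msi_center) / distance(tumor, mss_center))


def classify(tumor, msi_center, mss_center):
    # A positive value means the tumor lies farther from the MSI center,
    # so it is assigned to the MSS group.
    return "MSS" if log2_distance_ratio(tumor, msi_center, mss_center) > 0 else "MSI"


# Hypothetical two-gene profiles.
msi_center = center([[0.0, 1.0], [0.0, -1.0]])   # [0.0, 0.0]
mss_center = center([[5.0, 1.0], [5.0, -1.0]])   # [5.0, 0.0]
ratio = log2_distance_ratio([4.0, 0.0], msi_center, mss_center)
print(ratio, classify([4.0, 0.0], msi_center, mss_center))
```

Here the tumor is 4 units from the MSI center and 1 unit from the MSS center, which
gives the value of +2 described above. A tumor lying exactly on one of the centers
has a zero distance and is not handled by this sketch.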

Figure 3

Classification of MSI tumors as hereditary or sporadic cases based on two 
genes. 

Panel A shows the number of classification errors as a function of the number of
genes used. In crossvalidation we found a minimum of one error using two
genes; adding more genes increased the number of errors to a maximum of
twelve. Both genes were used in at least 36 of the 37 crossvalidation loops.
Panel B shows log2 of the ratio of the distances from a tumor to the centers of the
sporadic microsatellite instable group and the hereditary microsatellite instable
group. Panel C shows microarray signal values for the MLH1 and PIWIL1 genes for all
tumors. The asterisk indicates the misclassified tumor.
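
The 37 crossvalidation loops are consistent with a leave-one-out scheme over 37
tumors, although the text does not say so explicitly. Under that assumption, and
assuming a nearest-center rule with Euclidean distance, the error count of Panel A
could be obtained as sketched below. Names and values are hypothetical.

```python
import math


def centroid(rows):
    """Mean profile of a list of equal-length expression profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loocv_errors(samples, labels):
    """Hold out each sample once, rebuild the group centers from the rest,
    and count how often the held-out sample is assigned to the wrong group."""
    errors = 0
    for i, (held_out, truth) in enumerate(zip(samples, labels)):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centers = {
            label: centroid([s for s, l in train if l == label])
            for label in {l for _, l in train}
        }
        predicted = min(centers, key=lambda lab: distance(held_out, centers[lab]))
        errors += predicted != truth
    return errors


# Hypothetical two-gene signals; the last tumor resembles the other group.
samples = [[1.0, 9.0], [1.5, 8.5], [2.0, 9.5],
           [8.0, 1.0], [9.0, 2.0], [8.5, 1.5], [1.2, 9.1]]
labels = ["sporadic"] * 3 + ["hereditary"] * 4
errors = loocv_errors(samples, labels)
print(errors)
```

Repeating the count for gene sets of increasing size would give an error curve of the
kind shown in Panel A.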



Figure 4

Classification of microsatellite-instability status based on real-time PCR.
Panel A shows a cluster analysis of 18 of the 101 tumor samples and 9 genes
based on the microarray data, compared to real-time PCR data from the same
samples and genes. Dark colors indicate relatively low expression and the light/light grey
color palette indicates high expression. Panel B shows the result for 47 new independent
samples based on PCR data from 7 of the 9 genes. Relative distances are explained in the
legend to Figure 2. The two misclassified tumors are indicated with an asterisk. For
PCR primers and hybridization probes see supplement to methods.

Figure 5 

Kaplan-Meier estimates of crude survival among patients with Stage II and Stage III
colorectal cancer according to microsatellite status of the tumor, determined by gene
expression. Open triangles indicate censored samples. The patients left at risk are
denoted in brackets. The P values were calculated with use of the log-rank test.
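
For readers unfamiliar with the estimator, a Kaplan-Meier curve with censored
observations can be computed as in the sketch below. The follow-up times and
outcomes are hypothetical and do not come from the patients in the figure.

```python
def kaplan_meier(times, died):
    """Return [(time, survival)] at each time with at least one death.
    `died[i]` is True for a death and False for a censored observation."""
    observations = sorted(zip(times, died))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        deaths = leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(observations) and observations[i][0] == t:
            deaths += observations[i][1]   # True counts as 1
            leaving += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        at_risk -= leaving                 # censored patients leave the risk set
    return curve


# Hypothetical follow-up in months; False marks a censored patient.
months = [6, 12, 18, 24, 30]
died = [True, False, True, True, False]
curve = kaplan_meier(months, died)
print(curve)
```

Censored patients lower the number at risk without lowering the survival estimate,
which is why they are marked separately on the curves.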

Figure 6 

Phylogenetic tree resulting from unsupervised hierarchical clustering. Cluster
analysis of colon specimens with associated clinicopathological features.

Figure 7 

Multidimensional scaling plot showing distances between groups of tumors.

Figure 8

Performance of prediction of survival before and after separation into MSI-H and MSS.

Figure 9

Performance of the classifier for identification of hereditary disease.

Figure 10

Kaplan-Meier estimates of overall survival among patients with Dukes' B and Dukes'
C colon cancer according to microsatellite-instability status of the tumor, determined
by gene expression.

Detailed description of the invention
Classification of cancer

The present inventors have, using large-scale array-based screenings, found a pool
of genes, the expression products of which may be used to classify cancer in an
individual. The presence and level of the expression products
provide an expression pattern which is correlated to a specific status and/or
prognostic marker of the cancer. Characterization of the genes or functional analysis of
the gene expression products as such is not required to classify the cancer based
on the present method. Thus, the expression products of the plurality of genes can
be used as markers for the classification of disease.

One aspect of the present invention concerns a method for classifying cancer in an
individual having contracted cancer by determining the microsatellite status and a
prognostic marker in a sample. Determination of the microsatellite status and the
prognostic marker may be performed simultaneously or sequentially. In one
embodiment of the present invention the microsatellite status is determined. The
prognostic marker is determined in a sample, wherein the presence and/or the amount of
a number of gene expression products form a pattern wherefrom the prognostic
marker is determined. Based on the information gathered from the microsatellite
status and the prognostic marker the cancer can be classified. In a preferred
embodiment the prognostic marker is the hereditary or sporadic nature of the cancer.
The hereditary or sporadic nature of the cancer can be determined through a number
of steps comprising determining the presence and/or amount of gene expression
products forming a pattern in a sample. The sample comprises a number of gene
expression products the presence and/or amount of which forms a pattern that is
indicative of the hereditary or sporadic nature of the cancer. Hereby, an indication of
the hereditary or sporadic nature of the cancer is obtained.

In one embodiment of the invention the microsatellite status is determined using
conventional analysis of microsatellite status as described elsewhere herein.

In another embodiment of the present invention the microsatellite status is
determined by gene expression patterns wherein the presence and/or the amount of the
gene expression products form a pattern that is indicative of the microsatellite
status.

Classification of cancer provides knowledge of the survival chances of an individual
having contracted cancer. In the case of a cancer which according to the present
invention has been classified as a hereditary cancer, screening programmes of family
members of the individual having the classified cancer can be initiated. Such
screening programmes can comprise conventional screening programmes employing
sequencing and other methods as described elsewhere. Thus, individuals at risk
of developing cancer may be identified and action taken accordingly to detect
developing cancer at an early stage of the disease, greatly improving the chances of
successful intervention and thus survival rates.

Classification of cancer also provides insight into which sort of treatment should be
offered to the individual having contracted cancer, thus providing an improved
treatment response of the individual. Likewise, the individual may be spared
treatment that is inefficient in treating the particular class of cancer, and thus be spared
the severe side effects associated with a treatment that may not even be
suitable for the class of cancer.

Microsatellite status

Highly variable repetitive sequences found in microsatellite regions adjacent
to genes or other areas of interest may be used as markers for linkage analysis,
DNA fingerprinting, or other diagnostic applications.

Microsatellites are defined as loci (or regions within DNA sequences) where short
sequences of DNA are repeated in tandem. This means that the sequences
are repeated one right after the other. The sequence lengths used most often
are di-, tri-, or tetra-nucleotides. At the same location within the genomic DNA the
number of times the sequence (e.g. AC) is repeated often varies between individuals,
within populations, and/or between species. Due to the many repeats the
microsatellites are prone to alteration if repair of mismatches in the genome is reduced. In
the present invention the traditional method of determining microsatellite status by
employing microsatellite markers is replaced by determination of gene expression
patterns.

An important factor in multi-step carcinogenesis is genomic instability. The
development of some cancer forms is known to follow two distinct molecular routes. One
route is the microsatellite stable (MSS) pathway (also the chromosomal instable pathway), which
is often associated with a high frequency of allelic losses, cytogenetic abnormalities
and abnormal tumor DNA content. The second route is the microsatellite instable
(MSI) pathway, which is characterized by defects in the DNA mismatch repair system
that lead to a high rate of point mutations and small chromosomal insertions and
deletions. The small chromosomal insertions and deletions can be detected in
mono- and dinucleotide repeats (Boland CR, Thibodeau SN, Hamilton SR, et al.,
Cancer Res 1998;58(22):5248-57).

One aspect of the present invention relates to the classification of cancer in an
individual having contracted cancer by determining the microsatellite status and a
prognostic marker. One embodiment of the invention relates to microsatellite status
determined by conventional methods employing microsatellite analysis as described
above. Another embodiment of the invention relates to establishing the microsatellite
status by determining the presence and/or amount of gene expression products of a
sample which comprises a plurality of gene expression products forming a pattern
which is indicative of the microsatellite status.

The genes whose expression products are used according to the present invention are not
necessarily identical to the genes that are analysed by microsatellite markers in
conventional methods of determining microsatellite status. The pattern of the gene
expression products according to the present invention nevertheless correlates with
information on microsatellite status that can be obtained using traditional methods.

The determination of the microsatellite status and the prognostic marker of the
cancer may be performed sequentially. However, the determinations may also be
performed simultaneously.

Prognostic marker 

Together with knowledge of the microsatellite status in a sample of an individual
having contracted cancer, a prognostic marker is employed for classifying the
cancer. The prognostic marker may be any marker that provides knowledge of the
cancer type when combined with knowledge of microsatellite status. Consequently,
the prognostic marker may provide additional information on the cancer type when
the microsatellite status is stable and similarly when the microsatellite status is
instable. In a preferred embodiment of the present invention the prognostic marker
is the hereditary or sporadic nature of a cancer given that the microsatellite status is
instable. The prognostic marker may in another embodiment be a prognostic marker
for any feature or trait that provides further possibilities of classifying cancer.

The prognostic marker is determined in a sample comprising a number of gene
expression products, wherein the presence and/or amounts of gene expression
products form a pattern that is indicative of the prognostic marker.

Hereditary and sporadic nature of cancer 

Hereditary nonpolyposis colon cancer (HNPCC) is a hereditary cancer syndrome
which carries a very high risk of colon cancer and an above-normal risk of other
cancers (uterus, ovary, stomach, small intestine, biliary system, urinary tract, brain,
and skin). The HNPCC syndrome is due to mutation in a gene in the DNA mismatch
repair system, usually the MLH1 or MSH2 gene or less often the MSH6 or PMS2
gene. Families with HNPCC account for about 5% of all cases of colon cancer and
typically have the following features (called the Amsterdam clinical criteria):

Three or more first-degree relatives with colorectal cancer; affected family
members in two or more generations; and at least one person with colon cancer
diagnosed before the age of 50.
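
The three features listed above can be written as a simple check. This is a sketch of
the criteria exactly as worded in this description, not a clinical tool; the record
layout, field names and example families are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AffectedRelative:
    generation: int          # e.g. 1 = grandparents, 2 = parents, 3 = children
    age_at_diagnosis: int    # age in years at colorectal cancer diagnosis
    first_degree: bool       # hypothetical flag: counted as a first-degree relative


def meets_listed_criteria(relatives):
    """True if all three features listed in the text are fulfilled."""
    enough_first_degree = sum(r.first_degree for r in relatives) >= 3
    two_generations = len({r.generation for r in relatives}) >= 2
    early_onset = any(r.age_at_diagnosis < 50 for r in relatives)
    return enough_first_degree and two_generations and early_onset


# Hypothetical families.
family_a = [AffectedRelative(2, 62, True), AffectedRelative(3, 45, True),
            AffectedRelative(3, 51, True)]
family_b = [AffectedRelative(2, 62, True), AffectedRelative(3, 55, True),
            AffectedRelative(3, 51, True)]
print(meets_listed_criteria(family_a), meets_listed_criteria(family_b))
```

Family B fails only because no member was diagnosed before the age of 50.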

The highest risk with HNPCC is for colon cancer. A person with HNPCC has about
an 80% lifetime risk of colon cancer. Two-thirds of these tumors occur in the
proximal colon. Women with HNPCC have a 20-60% lifetime risk of endometrial cancer.
In HNPCC, the gastric cancer is usually intestinal-type adenocarcinoma. The
ovarian cancer in HNPCC may be diagnosed before age 40. Other HNPCC-related
cancers have characteristic features: the urinary tract cancers are transitional
carcinoma of the ureter and renal pelvis; the small bowel cancer is most common in the
duodenum and jejunum; and the most common type of brain tumor is glioblastoma.

The diagnosis of HNPCC may be made on the basis of the Amsterdam clinical
criteria (listed above) or on the basis of molecular genetic testing for mutations in a
mismatch repair gene (MLH1, MSH2, MSH6 or PMS2). Mutations in MLH1 and MSH2
account for 90% of HNPCC. Mutations in MSH6 and PMS2 account for the rest.

HNPCC is inherited in an autosomal dominant manner. Each child of an individual
with HNPCC has a 50% chance of inheriting the mutation. Most people diagnosed
with HNPCC have inherited the condition from a parent. However, not all individuals
with an HNPCC gene mutation have a parent who had cancer. Prenatal diagnosis
for pregnancies at increased risk for HNPCC is possible.

In tumors that are microsatellite instable it is often found that the DNA mismatch
repair proteins encoded by the MLH1 or MSH2 genes are inactivated. In
microsatellite instable hereditary non-polyposis colorectal cancers, germline
mutation in MLH1 and MSH2 and somatic loss of function of the normal allele have
been found to be associated with the disease.

For most sporadic MSI tumors, epigenetic hypermethylation of the MLH1 promoter
can be found to be associated with the cancer (Cunningham JM, Christensen ER,
Tester DJ, et al., Cancer Res 1998;58(15):3455-60; Kane MF, Loda M, Gaida GM,
et al., Cancer Res 1997;57(5):808-11; Herman JG, Umar A, Polyak K, et al., Proc
Natl Acad Sci U S A 1998;95(12):6870-5; Kuismanen SA, Holmberg MT, Salovaara
R, de la Chapelle A, Peltomaki P, Am J Pathol 2000;156(5):1773-9).

Forms of cancer

Cancer leads to a change in the expression of one or more genes. The methods
according to the invention may be used for classifying cancer according to the
microsatellite status and/or the hereditary or sporadic nature of the cancer. Thus, the
cancer may be any malignant condition in which genomic instability is involved in the
development of cancer, such as cancers related to hereditary non-polyposis
colorectal cancer, for example endometrial cancer, gastric cancer, small bowel cancer, ovarian
cancer, kidney cancer, renal pelvic cancer or tumors of the nervous system, such as
glioblastoma.

One particular form of cancer according to the present invention is that of the
colon/rectum.

The cancer may be of any tumor type, such as an adenocarcinoma, a carcinoma, a 
teratoma, a sarcoma, and/or a lymphoma. 



In relation to the gastro-intestinal tract, the biological condition may also be colitis
ulcerosa, Mb. Crohn, diverticulitis or adenomas.

Colorectal tumors 

The data presented herein relate to colorectal tumors and therefore the description
has focused on the gene expression level as one manner of identifying genes
involved in the prediction of survival in cancer tissue. The malignant progression of
cancer of the colon or rectum may be described using Dukes stages, where normal
mucosa may progress to Dukes A, superficial tumors, to Dukes B, slightly invasive
tumors, to Dukes C, tumors that have spread to lymph nodes, and finally to Dukes D,
tumors that have metastasized to other organs.

The grade of a tumor can also be expressed on a scale of I-IV. The grade reflects
the cytological appearance of the cells. Grade I cells are almost normal, whereas
grade II cells deviate slightly from normal. Grade III cells appear clearly abnormal,
whereas grade IV cells are highly abnormal.

The phrase colon cancer is in this application meant to be equivalent to the phrase
colorectal cancer. Colon cancers may be located in the right side of the colon, the
left side of the colon, the transverse part of the colon and/or in the rectum.

Samples 

The samples according to the present invention may be any cancer tissue. 

The sample may be in a form suitable to allow analysis by the skilled artisan, such
as a biopsy of the tissue, or a superficial sample scraped from the tissue. In one
embodiment of the invention it is preferred that the sample is from a resected colon
cancer tumor. In another embodiment the sample may be prepared by forming a
suspension of cells made from the tissue. The sample may, however, also be an
extract obtained from the tissue or obtained from a cell suspension made from the
tissue. The sample may be fresh or frozen, or treated with chemicals.

Expression pattern 

Expression of one or more genes in a sample forms a pattern that is
characteristic of the state of the cell. In a sample from an individual having contracted cancer
a plurality of gene expression products are present. By expression pattern is meant
the presence of a combination of a number of expression products and/or the
amount of expression products specific for a given biological condition, such as
cancer. The pattern is produced by determining the expression products of selected
genes that together reveal a pattern that is indicative of the biological condition.
Thus, a selection of the genes that carry information about a specific condition is
developed. Selection of the genes is achieved by analyzing large numbers of genes
and their expression products to find the genes that will enable the desired
differentiation between various conditions, such as microsatellite status (MSS or MSI)
and/or prognostic marker, such as for example the sporadic or hereditary nature of a
given cancer sample. The criteria for selection of the best genes for the pattern to be
indicative of given biological conditions include confidence levels, i.e. how accurate
the selected genes forming an expression pattern are in giving correct information on
the biological condition. Thus, in one aspect of the present invention a specific
pattern of gene expression profiles can be used to determine the microsatellite status in
the sample. In a second aspect of the present invention the microsatellite status is
determined together with a specific pattern of the presence and/or amount of a plurality
of gene expression products, wherefrom a prognostic marker is determined.

Determination of the microsatellite status employing gene expression patterns 
One aspect of the invention specifically relates to a method for determining the
microsatellite status in a sample of an individual having contracted cancer based on
determination of the expression pattern of at least two genes, such as at least three
genes, such as at least four genes, such as at least 5 genes, such as at least 6
genes, such as at least 7 genes, such as at least 8 genes, such as at least 9 genes,
such as at least 10 genes, such as at least 15 genes, such as at least 20 genes,
such as at least 30 genes, such as at least 40 genes, such as at least 50 genes,
such as at least 60 genes, such as at least 70 genes, such as at least 80 genes,
such as at least 90 genes, such as at least 126 genes selected from the group of
genes listed in Table 1 below.

Table 1

Gene name | Ref seq | Gene symbol | SEQ ID NO.:
chemokine (C-C motif) ligand 5 | NM_002985 | CCL5 | 1
tryptophanyl-tRNA synthetase | NM_004184 | WARS | 2
proteasome (prosome, macropain) activator subunit 1 (PA28 alpha) | NM_006263 | PSME1 | 3
bone marrow stromal cell antigen 2 | NM_004335 | BST2 | 4
ubiquitin-conjugating enzyme E2L 6 | NM_004223 | UBE2L6 | 5
A kinase (PRKA) anchor protein 1 | NM_003488 | AKAP1 | 6
proteasome (prosome, macropain) activator subunit 2 (PA28 beta) | NM_002818 | PSME2 | 7
carcinoembryonic antigen-related cell adhesion molecule 5 | NM_004363 | CEACAM5 | 8
FERM, RhoGEF (ARHGEF) and pleckstrin domain protein 1 (chondrocyte-derived) | NM_005766 | FARP1 | 9
myosin X | NM_012334 | MYO10 | 10
heterogeneous nuclear ribonucleoprotein L | NM_001533 | HNRPL | 11
autocrine motility factor receptor | NM_001144 | AMFR | 12
dimethylarginine dimethylaminohydrolase 2 | NM_013974 | DDAH2 | 13
tumor necrosis factor, alpha-induced protein 2 | NM_006291 | TNFAIP2 | 14
mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) | NM_000249 | MLH1 | 15
thymidylate synthetase | NM_001071 | TYMS | 16
intercellular adhesion molecule 1 (CD54), human rhinovirus receptor | NM_000201 | ICAM1 | 17
general transcription factor IIA, 2, 12kDa | NM_004492 | GTF2A2 | 18
Rho-associated, coiled-coil containing protein kinase 2 | NM_004850 | ROCK2 | 19
ATP binding protein associated with cell differentiation | NM_005783 | TXNDC9 | 20
NCK adaptor protein 2 | NM_003581 | NCK2 | 21
phytanoyl-CoA hydroxylase (Refsum disease) | NM_006214 | PHYH | 22
metastasis-associated gene family, member 2 | NM_004739 | MTA2 | 23
amiloride binding protein 1 (amine oxidase (copper-containing)) | NM_001091 | ABP1 | 24
biliverdin reductase A | NM_000712 | BLVRA | 25
phospholipase C, beta 4 | NM_000933 | PLCB4 | 26
chemokine (C-X-C motif) ligand 9 | NM_002416 | CXCL9 | 27
purine-rich element binding protein A | NM_005859 | PURA | 28
quinolinate phosphoribosyltransferase (nicotinate-nucleotide pyrophosphorylase (carboxylating)) | NM_014298 | QPRT | 29
retinoic acid receptor responder (tazarotene induced) 3 | NM_004585 | RARRES3 | 30
chemokine (C-C motif) ligand 4 | NM_002984 | CCL4 | 31
forkhead box O3A | NM_001455 | FOXO3A | 32
interferon, alpha-inducible protein (clone IFI-6-16) | NM_002038 | G1P3 | 34
 | NM_022873 | | 123
chemokine (C-X-C motif) ligand 10 | NM_001565 | CXCL10 | 35
metallothionein 1G | NM_005950 | MT1G | 36
tumor necrosis factor receptor superfamily, member 6 | NM_000043 | TNFRSF6 | 37
 | NM_152877 | | 133
 | NM_152876 | | 132
 | NM_152875 | | 134
 | NM_152872 | | 130
 | NM_152873 | | 33
 | NM_152871 | | 129
 | NM_152874 | | 131
endothelial cell growth factor 1 (platelet-derived) | NM_001953 | ECGF1 | 38
SCO cytochrome oxidase deficient homolog 2 (yeast) | NM_005138 | SCO2 | 39
chemokine (C-X-C motif) ligand 13 (B-cell chemoattractant) | NM_006419 | CXCL13 | 40
Granulysin | NM_006433 | GNLY | 41
CD2 antigen (p50), sheep red blood cell receptor | NM_001767 | CD2 | 42
splicing factor, arginine/serine-rich 6 | NM_006275 | SFRS6 | 43
teratocarcinoma-derived growth factor 1 | NM_003212 | TDGF1 | 44
metallothionein 1H | NM_005951 | MT1H | 45
cytochrome P450, family 2, subfamily B, polypeptide 6 | NM_000767 | CYP2B6 | 46
tumor necrosis factor (ligand) superfamily, member 9 | NM_003811 | TNFSF9 | 47
RNA binding motif protein 12 | NM_006047 | RBM12 | 48
heat shock 105kDa/110kDa protein 1 | NM_006644 | HSPH1 | 49
staufen, RNA binding protein (Drosophila) | NM_004602 | STAU | 50
 | NM_017452 | | 125
 | NM_017453 | | 126
lymphocyte antigen 6 complex, locus G6D | NM_021246 | LY6G6D | 51
calcium binding protein P22 | NM_007236 | CHP | 52
CDC14 cell division cycle 14 homolog B (S. cerevisiae) | NM_003671 | CDC14B | 53
 | NM_033331 | | 115
epiplakin 1 | XM_372063 | EPPK1 | 54
metallothionein 1X | NM_005952 | MT1X | 55
transforming growth factor, beta receptor II (70/80kDa) | NM_003242 | TGFBR2 | 56
protein kinase C binding protein 1 | NM_012408 | PRKCBP1 | 57
 | NM_183047 | | 124
transmembrane 4 superfamily member 6 | NM_003270 | TM4SF6 | 58
pleckstrin homology domain containing, family B (evectins) member 1 | NM_021200 | PLEKHB1 | 59
apolipoprotein L, 1 | NM_003661 | APOL1 | 60
 | NM_145343 | | 120
indoleamine-pyrrole 2,3 dioxygenase | NM_002164 | INDO | 61
forkhead box A2 | NM_021784 | FOXA2 | 62
granzyme H (cathepsin G-like 2, protein h-CCPX) | NM_033423 | GZMH | 63
baculoviral IAP repeat-containing 3 | NM_001165 | BIRC3 | 64
Homo sapiens metallothionein 1H-like protein (Hs 382039) | AF333388 | | 135
KIAA0182 protein | NM_014615 | KIAA0182 | 117
G protein-coupled receptor 56 | NM_005682 | GPR56 | 65
 | NM_201524 | | 116
metallothionein 2A | NM_005953 | MT2A | 66
F-box only protein 21 | NM_015002 | FBXO21 | 67
erythrocyte membrane protein band 4.1-like 1 | NM_012156 | EPB41L1 | 68
hypothetical protein MGC21416 | NM_173834 | MGC21416 | 69
protein O-fucosyltransferase 1 | NM_015352 | POFUT1 | 70
metallothionein 1E (functional) | NM_175617 | MT1E | 71
troponin T1, skeletal, slow | NM_003283 | TNNT1 | 72
chimerin (chimaerin) 2 | NM_004067 | CHN2 | 73
heterogeneous nuclear ribonucleoprotein H1 (H) | NM_005520 | HNRPH1 | 74
ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit, isoform 1, cardiac muscle | NM_004046 | ATP5A1 | 75
eukaryotic translation initiation factor 5A | NM_001970 | EIF5A | 76
perforin 1 (pore forming protein) | NM_005041 | PRF1 | 77
OGT(O-Glc-NAc transferase)-interacting protein 106 KDa | NM_014965 | OIP106 | 78
DEAD (Asp-Glu-Ala-Asp) box polypeptide 27 | NM_017895 | DDX27 | 79
vacuolar protein sorting 35 (yeast) | NM_018206 | VPS35 | 80
tripartite motif-containing 44 | NM_017583 | TRIM44 | 81
transmembrane, prostate androgen induced RNA | NM_020182 | TMEPAI | 82
 | NM_199169 | | 127
 | NM_199170 | | 128
dynein, cytoplasmic, light polypeptide 2A | NM_014183 | DNCL2A | 83
 | NM_177953 | | 122
leucine aminopeptidase 3 | NM_015907 | LAP3 | 84
chromosome 20 open reading frame 35 | NM_018478 | C20orf35 | 85
 | NM_033542 | | 118
solute carrier family 38, member 1 | NM_030674 | SLC38A1 | 86
CGI-85 protein | NM_016028 | CGI-85 | 87
death associated transcription factor 1 | NM_022105 | DATF1 | 88
 | NM_080796 | | 121
hepatocellular carcinoma-associated antigen 112 | NM_018487 | HCA112 | 89
sestrin 1 | NM_014454 | SESN1 | 90
hypothetical protein FLJ20315 | NM_017763 | FLJ20315 | 91
hypothetical protein FLJ20647 | NM_017918 | FLJ20647 | 92
membrane protein expressed in epithelial-like lung adenocarcinoma | NM_024792 | CT120 | 93
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide | NM_014314 | RIG-I | 94
keratin 23 (histone deacetylase inducible) | NM_015515 | KRT23 | 95
UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 6 (GalNAc-T6) | NM_007210 | GALNT6 | 96
aryl hydrocarbon receptor nuclear translocator-like 2 | NM_020183 | ARNTL2 | 97
apobec-1 complementation factor | NM_014576 | ACF | 98
 | NM_138932 | | 119
hypothetical protein FLJ20232 | NM_019008 | FLJ20232 | 99
apolipoprotein L, 2 | NM_030882 | APOL2 | 100
 | NM_145343 | | 120
mitochondrial solute carrier protein | NM_016612 | MSCP | 101
hypothetical protein FLJ20618 | NM_017903 | FLJ20618 | 102
SET translocation (myeloid leukaemia-associated) | NM_003011.1 | SET | 103
ATPase, class II, type 9a | XM_030577.9 | ATP9a | 104


One embodiment of the invention concerning the determination of microsatellite
status is based on the expression pattern of at least 2 genes, such as at least 3
genes, such as at least 4 genes, such as at least 5 genes, such as at least 6 genes,
such as at least 7 genes, such as at least 8 genes, such as at least 9 genes, such
as at least 10 genes, such as at least 15 genes, such as at least 20 genes, such as
at least 25 genes selected from the group of genes listed in Table 2.



Table 2

Gene name | Ref seq | Gene symbol | SEQ ID NO.:
chemokine (C-C motif) ligand 5 | NM_002985 | CCL5 | 1
tryptophanyl-tRNA synthetase | NM_004184 | WARS | 2
proteasome (prosome, macropain) activator subunit 1 (PA28 alpha) | NM_006263 | PSME1 | 3
bone marrow stromal cell antigen 2 | NM_004335 | BST2 | 4
ubiquitin-conjugating enzyme E2L 6 | NM_004223 | UBE2L6 | 5
A kinase (PRKA) anchor protein 1 | NM_003488 | AKAP1 | 6
proteasome (prosome, macropain) activator subunit 2 (PA28 beta) | NM_002818 | PSME2 | 7
carcinoembryonic antigen-related cell adhesion molecule 5 | NM_004363 | CEACAM5 | 8
FERM, RhoGEF (ARHGEF) and pleckstrin domain protein 1 (chondrocyte-derived) | NM_005766 | FARP1 | 9
myosin X | NM_012334 | MYO10 | 10
heterogeneous nuclear ribonucleoprotein L | NM_001533 | HNRPL | 11
autocrine motility factor receptor | NM_001144 | AMFR | 12
dimethylarginine dimethylaminohydrolase 2 | NM_013974 | DDAH2 | 13
tumor necrosis factor, alpha-induced protein 2 | NM_006291 | TNFAIP2 | 14
mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli) | NM_000249 | MLH1 | 15
thymidylate synthetase | NM_001071 | TYMS | 16
intercellular adhesion molecule 1 (CD54), human rhinovirus receptor | NM_000201 | ICAM1 | 17
general transcription factor IIA, 2, 12kDa | NM_004492 | GTF2A2 | 18
Rho-associated, coiled-coil containing protein kinase 2 | NM_004850 | ROCK2 | 19
ATP binding protein associated with cell differentiation | NM_005783 | APACD | 20
metastasis-associated gene family, member 2 | NM_004739 | MTA2 | 23
chemokine (C-X-C motif) ligand 10 | NM_001565 | CXCL10 | 35
splicing factor, arginine/serine-rich 6 | NM_006275 | SFRS6 | 43
protein kinase C binding protein 1 | NM_012408 | PRKCBP1 | 57
 | NM_183047 | | 124
hepatocellular carcinoma-associated antigen 112 | NM_018487 | HCA112 | 89
hypothetical protein FLJ20618 | NM_017903 | FLJ20618 | 102
SET translocation (myeloid leukaemia-associated) | NM_003011.1 | SET | 103
ATPase, class II, type 9a | XM_030577.9 | ATP9a | 104



or from
Table 3

Gene name | Ref seq | Gene symbol | SEQ ID NO.:
heterogeneous nuclear ribonucleoprotein L | NM_001533 | HNRPL | 11
NCK adaptor protein 2 | NM_003581 | NCK2 | 21
phytanoyl-CoA hydroxylase (Refsum disease) | NM_006214 | PHYH | 22
metastasis-associated gene family, member 2 | NM_004739 | MTA2 | 23
amiloride binding protein 1 (amine oxidase (copper-containing)) | NM_001091 | ABP1 | 24
biliverdin reductase A | NM_000712 | BLVRA | 25
phospholipase C, beta 4 | NM_000933 | PLCB4 | 26
chemokine (C-X-C motif) ligand 9 | NM_002416 | CXCL9 | 27
purine-rich element binding protein A | NM_005859 | PURA | 28
quinolinate phosphoribosyltransferase (nicotinate-nucleotide pyrophosphorylase (carboxylating)) | NM_014298 | QPRT | 29
retinoic acid receptor responder (tazarotene induced) 3 | NM_004585 | RARRES3 | 30
chemokine (C-C motif) ligand 4 | NM_002984 | CCL4 | 31
forkhead box O3A | NM_001455 | FOXO3A | 32
metallothionein 1X | NM_005952 | MT1X | 55
interferon, alpha-inducible protein (clone IFI-6-16) | NM_002038 | G1P3 | 34
 | NM_022873 | | 123
chemokine (C-X-C motif) ligand 10 | NM_001565 | CXCL10 | 35
metallothionein 1G | NM_005950 | MT1G | 36
tumor necrosis factor receptor superfamily, member 6 | NM_000043 | TNFRSF6 | 37
 | NM_152877 | | 133
 | NM_152876 | | 132
 | NM_152875 | | 134
 | NM_152872 | | 130
 | NM_152873 | | 33
 | NM_152871 | | 129
 | NM_152874 | | 131
endothelial cell growth factor 1 (platelet-derived) | NM_001953 | ECGF1 | 38
SCO cytochrome oxidase deficient homolog 2 (yeast) | NM_005138 | SCO2 | 39
chemokine (C-X-C motif) ligand 13 (B-cell chemoattractant) | NM_006419 | CXCL13 | 40
Granulysin | NM_006433 | GNLY | 41
splicing factor, arginine/serine-rich 6 | NM_006275 | SFRS6 | 43
protein kinase C binding protein 1 | NM_012408 | PRKCBP1 | 57
 | NM_183047 | | 124
hepatocellular carcinoma-associated antigen 112 | NM_018487 | HCA112 | 89
hypothetical protein FLJ20618 | NM_017903 | FLJ20618 | 102
SET translocation (myeloid leukaemia-associated) | NM_003011.1 | SET | 103
ATPase, class II, type 9a | XM_030577.9 | ATP9a | 104



or from
Table 4

Gene name | Ref seq | Gene symbol | SEQ ID NO.:
heterogeneous nuclear ribonucleoprotein L | NM_001533 | HNRPL | 11
metastasis-associated gene family, member 2 | NM_004739 | MTA2 | 23
chemokine (C-X-C motif) ligand 10 | NM_001565 | CXCL10 | 35
CD2 antigen (p50), sheep red blood cell receptor | NM_001767 | CD2 | 42
splicing factor, arginine/serine-rich 6 | NM_006275 | SFRS6 | 43
teratocarcinoma-derived growth factor 1 | NM_003212 | TDGF1 | 44
metallothionein 1H | NM_005951 | MT1H | 45
cytochrome P450, family 2, subfamily B, polypeptide 6 | NM_000767 | CYP2B6 | 46
tumor necrosis factor (ligand) superfamily, member 9 | NM_003811 | TNFSF9 | 47
RNA binding motif protein 12 | NM_006047 | RBM12 | 48
heat shock 105kDa/110kDa protein 1 | NM_006644 | HSPH1 | 49
staufen, RNA binding protein (Drosophila) | NM_004602 | STAU | 50
 | NM_017452 | | 125
 | NM_017453 | | 126
lymphocyte antigen 6 complex, locus G6D | NM_021246 | LY6G6D | 51
calcium binding protein P22 | NM_007236 | CHP | 52
CDC14 cell division cycle 14 homolog B (S. cerevisiae) | NM_003671 | CDC14B | 53
 | NM_033331 | | 115
epiplakin 1 | XM_372063 | EPPK1 | 54
metallothionein 1X | NM_005952 | MT1X | 55
transforming growth factor, beta receptor II (70/80kDa) | NM_003242 | TGFBR2 | 56
protein kinase C binding protein 1 | NM_012408 | PRKCBP1 | 57
 | NM_183047 | | 129
transmembrane 4 superfamily member 6 | NM_003270 | TM4SF6 | 58
pleckstrin homology domain containing, family B (evectins) member 1 | NM_021200 | PLEKHB1 | 59
apolipoprotein L, 1 | NM_003661 | APOL1 | 60
 | NM_145343 | | 125
indoleamine-pyrrole 2,3 dioxygenase | NM_002164 | INDO | 61
forkhead box A2 | NM_021784 | FOXA2 | 62
hepatocellular carcinoma-associated antigen 112 | NM_018487 | HCA112 | 89
mitochondrial solute carrier protein | NM_016612 | MSCP | 101
hypothetical protein FLJ20618 | NM_017903 | FLJ20618 | 102
SET translocation (myeloid leukaemia-associated) | NM_003011.1 | SET | 103
ATPase, class II, type 9a | XM_030577.9 | ATP9a | 104



or from 
Table 5

Gene name | Ref seq | Gene symbol | SEQ ID NO.:
heterogeneous nuclear ribonucleoprotein L | NM_001533 | HNRPL | 11
metastasis-associated gene family, member 2 | NM_004739 | MTA2 | 23
chemokine (C-X-C motif) ligand 10 | NM_001565 | CXCL10 | 35
splicing factor, arginine/serine-rich 6 | NM_006275 | SFRS6 | 43
protein kinase C binding protein 1 | NM_012408 | PRKCBP1 | 57
 | NM_183047 | | 124
granzyme H (cathepsin G-like 2, protein h-CCPX) | NM_033423 | GZMH | 63
baculoviral IAP repeat-containing 3 | NM_001165 | BIRC3 | 64
Homo sapiens metallothionein 1H-like protein (Hs 382039) | AF333388 | | 135
KIAA0182 protein | NM_014615 | KIAA0182 | 117
G protein-coupled receptor 56 | NM_005682 | GPR56 | 65
 | NM_201524 | | 116
metallothionein 2A | NM_005953 | MT2A | 66
F-box only protein 21 | NM_015002 | FBXO21 | 67
erythrocyte membrane protein band 4.1-like 1 | NM_012156 | EPB41L1 | 68
hypothetical protein MGC21416 | NM_173834 | MGC21416 | 69
protein O-fucosyltransferase 1 | NM_015352 | POFUT1 | 70
metallothionein 1E (functional) | NM_175617 | MT1E | 71
troponin T1, skeletal, slow | NM_003283 | TNNT1 | 72
chimerin (chimaerin) 2 | NM_004067 | CHN2 | 73
heterogeneous nuclear ribonucleoprotein H1 (H) | NM_005520 | HNRPH1 | 74
ATP synthase, H+ transporting, mitochondrial F1 complex, alpha subunit, isoform 1, cardiac muscle | NM_004046 | ATP5A1 | 75
eukaryotic translation initiation factor 5A | NM_001970 | EIF5A | 76
perforin 1 (pore forming protein) | NM_005041 | PRF1 | 77
OGT(O-Glc-NAc transferase)-interacting protein 106 KDa | NM_014965 | OIP106 | 78
DEAD (Asp-Glu-Ala-Asp) box polypeptide 27 | NM_017895 | DDX27 | 79
hepatocellular carcinoma-associated antigen 112 | NM_018487 | HCA112 | 89
hypothetical protein FLJ20232 | NM_019008 | FLJ20232 | 99
apolipoprotein L, 2 | NM_030882 | APOL2 | 100
 | NM_145343 | | 120
hypothetical protein FLJ20618 | NM_017903 | FLJ20618 | 102
SET translocation (myeloid leukaemia-associated) | NM_003011.1 | SET | 103
ATPase, class II, type 9a | XM_030577.9 | ATP9a | 104



or from
Table 6

Gene name | Ref seq | Gene symbol | SEQ ID NO.:
heterogeneous nuclear ribonucleoprotein L | NM_001533 | HNRPL | 11
metastasis-associated gene family, member 2 | NM_004739 | MTA2 | 23
chemokine (C-X-C motif) ligand 10 | NM_001565 | CXCL10 | 35
metallothionein 1G | NM_005950 | MT1G | 36
splicing factor, arginine/serine-rich 6 | NM_006275 | SFRS6 | 43
protein kinase C binding protein 1 | NM_012408 | PRKCBP1 | 57
 | NM_183047 | | 129
vacuolar protein sorting 35 (yeast) | NM_018206 | VPS35 | 80
tripartite motif-containing 44 | NM_017583 | TRIM44 | 81
transmembrane, prostate androgen induced RNA | NM_020182 | TMEPAI | 82
 | NM_199169 | | 127
 | NM_199170 | | 128
dynein, cytoplasmic, light polypeptide 2A | NM_014183 | DNCL2A | 83
 | NM_177953 | | 122
leucine aminopeptidase 3 | NM_015907 | LAP3 | 84
chromosome 20 open reading frame 35 | NM_018478 | C20orf35 | 85
 | NM_033542 | | 118
solute carrier family 38, member 1 | NM_030674 | SLC38A1 | 86
CGI-85 protein | NM_016028 | CGI-85 | 87
death associated transcription factor 1 | NM_022105 | DATF1 | 88
 | NM_080796 | | 121
hepatocellular carcinoma-associated antigen 112 | NM_018487 | HCA112 | 89
sestrin 1 | NM_014454 | SESN1 | 90
hypothetical protein FLJ20315 | NM_017763 | FLJ20315 | 91
hypothetical protein FLJ20647 | NM_017918 | FLJ20647 | 92
membrane protein expressed in epithelial-like lung adenocarcinoma | NM_024792 | CT120 | 93
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide | NM_014314 | RIG-I | 94
keratin 23 (histone deacetylase inducible) | NM_015515 | KRT23 | 95
UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase 6 (GalNAc-T6) | NM_007210 | GALNT6 | 96
aryl hydrocarbon receptor nuclear translocator-like 2 | NM_020183 | ARNTL2 | 97
apobec-1 complementation factor | NM_014576 | ACF | 98
 | NM_138932 | | 119
hypothetical protein FLJ20618 | NM_017903 | FLJ20618 | 102
SET translocation (myeloid leukaemia-associated) | NM_003011.1 | SET | 103
ATPase, class II, type 9a | XM_030577.9 | ATP9a | 104



Another embodiment of the invention concerning the determination of microsatellite
status is based on the expression pattern of at least 2 genes, such as at least 3
genes, such as at least 4 genes, such as at least 5 genes, such as at least 6 genes,
such as at least 7 genes, such as at least 8 genes, such as at least 9 genes
selected from the group of genes listed in Table 7 below.
RNA purification. Colon specimens were obtained fresh from surgery and were
immediately snap frozen in liquid nitrogen either as they were, in OCT compound, or in
an SDS/guanidinium thiocyanate solution. Total RNA was isolated using RNAzol
(WAK-Chemie Medical) or spin column technology (Sigma) following the manufacturers'
instructions.

Gene expression analysis. These procedures were performed as described in detail
elsewhere (Dyrskjøt et al.). Briefly, ten µg of total RNA was used as starting material
for the target preparation as described. First and second strand cDNA synthesis was
performed using the Superscript II System (Invitrogen) according to the
manufacturers' instructions except using an oligo-dT primer containing a T7 RNA polymerase
promoter site. Labelled aRNA was prepared using the BioArray High Yield RNA
Transcript Labelling Kit (Enzo) using biotin-labelled CTP and UTP (Enzo) in the
reaction together with unlabelled NTPs. Unincorporated nucleotides were removed
using RNeasy columns (Qiagen). Fifteen µg of cRNA was fragmented, loaded onto
the Affymetrix HG_U133A probe array cartridge and hybridized for 16 h. The arrays
were washed and stained in the Affymetrix Fluidics Station and scanned using a
confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner
G2500A). The readings from the quantitative scanning were analyzed by the
Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized using RMA
(robust multi array normalisation, Irizarry et al. 2002) in the statistical application R.
Redundant probesets (as defined from Unigene build 168) with high correlation
(>0.5) over all samples were removed, which reduced the dataset to approximately
14,400 probesets. This dataset was used as the source for all further calculations in this
manuscript.

Unsupervised agglomerative hierarchical clustering

For hierarchical cluster analysis 1239 genes with a variation across all samples
greater than 0.5 were median-centred to a magnitude of 1. Samples and genes
were then clustered using average linkage clustering with a modified Pearson
correlation as similarity metric (Eisen et al., PNAS 95: 14863-14868, 1998). The cluster
dendrogram was visualized with TreeView (Eisen).
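
The clustering step described above can be illustrated with a minimal sketch. It is
not the Cluster/TreeView software cited in the text: it assumes the standard centred
Pearson correlation rather than the modified version of Eisen et al., defines the
distance as 1 minus the correlation, and uses hypothetical function names. Profiles
with zero variance are not handled.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Standard (centred) Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = {i: (i,) for i in range(len(profiles))}
    # Item-to-item distance: 1 - correlation (0 = same shape, 2 = opposite shape).
    d = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
         for i, j in combinations(range(len(profiles)), 2)}

    def cluster_distance(a, b):
        # Average linkage: mean of all item-to-item distances between two clusters.
        pairs = [d[min(i, j), max(i, j)] for i in a for j in b]
        return sum(pairs) / len(pairs)

    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        # Merge the two closest clusters and record the step.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b],
                       cluster_distance(clusters[a], clusters[b])))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

The list of merges is the information a dendrogram viewer draws: each entry joins two
branches at the height given by the distance.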

Group testing

We make a statistical test where the p-value is evaluated through permutations. For
each group and gene we calculate the average and the sum of squared deviations
from the average. We then sum these over the genes and the groups. With four
groups, this sum of squared deviations is denoted S1. Joining DK and SF, so that we
end up with two groups, the sum of squared deviations is denoted S2. As a test
statistic we use S1/S2. A small value indicates that there is a real reduction in the
deviations when going from 2 to 4 groups and thus the groups have a real significance.
To judge if a value is significantly small we use permutations. For each of the groups
left when joining DK and SF we randomly allocate the members to a pseudo DK and
pseudo SF in such a way that the number of members in each group is as in the
original data.

To get an understanding of this separation we performed a test to see if this is
caused by few genes or if many genes are involved. For this test we calculated
S1 = Σ_genes S1(gene) and similarly S2 = Σ_genes S2(gene). For each gene j we used
the test statistic S1(j)/S2(j) (Table 3).
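
The permutation test described above can be sketched as follows. This is an
illustration under stated assumptions, not the code used for the study: the function
names are hypothetical, expression data are given as one list of gene values per
sample, and the test is shown for the DK/SF split within the MSI and MSS groups.

```python
import random


def within_group_ss(expr, labels):
    """Sum over genes and groups of squared deviations from the group average."""
    groups = {}
    for profile, label in zip(expr, labels):
        groups.setdefault(label, []).append(profile)
    total = 0.0
    for members in groups.values():
        for gene in range(len(members[0])):
            values = [m[gene] for m in members]
            mean = sum(values) / len(values)
            total += sum((v - mean) ** 2 for v in values)
    return total


def s1_over_s2(expr, status, country):
    s1 = within_group_ss(expr, list(zip(status, country)))  # four groups
    s2 = within_group_ss(expr, status)                       # DK and SF joined
    return s1 / s2


def permutation_test(expr, status, country, n_perm=100, seed=0):
    """Return (observed S1/S2, number of smaller permuted values, permuted minimum)."""
    rng = random.Random(seed)
    observed = s1_over_s2(expr, status, country)
    permuted_stats = []
    for _ in range(n_perm):
        pseudo = list(country)
        for group in sorted(set(status)):
            idx = [i for i, s in enumerate(status) if s == group]
            shuffled = [country[i] for i in idx]
            rng.shuffle(shuffled)  # keeps the DK/SF counts within each joined group
            for i, label in zip(idx, shuffled):
                pseudo[i] = label
        permuted_stats.append(s1_over_s2(expr, status, pseudo))
    n_smaller = sum(stat < observed for stat in permuted_stats)
    return observed, n_smaller, min(permuted_stats)
```

The three returned values correspond to the three result columns of a permutation
table: the statistic from the data, the count of permutations with a smaller value,
and the minimum over the permutations. Restricting `expr` to a single gene gives the
per-gene statistic S1(j)/S2(j).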

Multidimensional scaling

We carried out multidimensional scaling on median-centered and normalized data
using cmdscale in the statistical application R and visualized the result in a
two-dimensional plot.

Microsatellite status classifier

The readings from the quantitative scanning were analyzed by the Affymetrix Gene
Expression Analysis Software (MAS 5.0) and normalized using RMA (robust multi
array normalisation, Irizarry et al. 2002) in the statistical application R. Redundant
probesets (as defined from Unigene build 168) with high correlation (>0.5) over all
samples were removed, which reduced the dataset to approximately 14,400
probesets.

The microsatellite instability status classifier was based on a dataset of 4,266 genes.
These genes result from the removal of genes with a variance over all tumor
samples smaller than 0.2 and genes that separate Danish from Finnish samples with a
t-value numerically greater than 2. We used a normal distribution with the mean
dependent on the gene and the group (MSI, MSS). For each gene, we calculated the
variation between the groups and the variation within the groups to select genes
with a high ratio between these. To classify a sample, we calculated the sum over
the genes of the squared distance from the sample value to the group mean,
standardized by the variance, and assigned the sample to the nearest group. The sample
to be classified was excluded when calculating group means and variances.
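
A minimal sketch of such a classifier is given below. It is an illustration only: the
function names are hypothetical, and the text does not state which variance is used
for standardization, so a pooled within-group variance per gene is assumed here.
Genes with zero within-group variation are treated as uninformative.

```python
def gene_stats(X, y, gene):
    """Group means plus between-group and within-group sums of squares for one gene."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[gene])
    grand = sum(row[gene] for row in X) / len(X)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    return means, between, within


def select_genes(X, y, n_genes):
    """Indices of the genes with the highest between/within variation ratio."""
    def ratio(gene):
        _, between, within = gene_stats(X, y, gene)
        return between / within if within > 0 else 0.0
    return sorted(range(len(X[0])), key=ratio, reverse=True)[:n_genes]


def classify(sample, X, y, genes):
    """Assign the sample to the group with the smallest standardized squared distance."""
    n_groups = len(set(y))
    distance = dict.fromkeys(sorted(set(y)), 0.0)
    for gene in genes:
        means, _, within = gene_stats(X, y, gene)
        variance = within / (len(X) - n_groups)  # pooled variance (an assumption)
        if variance == 0:
            continue
        for group, mean in means.items():
            distance[group] += (sample[gene] - mean) ** 2 / variance
    return min(distance, key=distance.get)


def leave_one_out_errors(X, y, n_genes):
    """Count misclassifications when each sample is left out of training in turn."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, n_genes)
        errors += classify(X[i], X_train, y_train, genes) != y[i]
    return errors
```

Gene selection is repeated inside the leave-one-out loop so that the left-out sample
influences neither the chosen genes nor the group means and variances.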

Estimation of classifier stability

We validated the performance of the classifier by permutation. One hundred
datasets consisting of 30 MSS samples and 25 MSI samples were randomly chosen by
permutation for training of the classifier, with the remaining samples in each case
being assigned to a test set. Averages over the 100 datasets of the number of errors in
the cross-validation of the training set and in the test set were used as a measure of
the precision of the classifier.
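
The resampling scheme can be sketched as below. The names are hypothetical, and
`count_errors` stands for any user-supplied function that trains on the training
indices and returns the pair (cross-validation errors, test-set errors); it is an
assumption of this sketch rather than something defined in the text.

```python
import random


def stratified_splits(labels, n_train, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx) with a fixed number of training samples per class.

    n_train maps each class label to its training-set size,
    e.g. {"MSS": 30, "MSI": 25}.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    for _ in range(n_repeats):
        train = []
        for label in sorted(by_class):
            train += rng.sample(by_class[label], n_train[label])
        chosen = set(train)
        # Every sample not drawn for training goes to the test set.
        test = [i for i in range(len(labels)) if i not in chosen]
        yield sorted(train), test


def average_errors(labels, n_train, count_errors, n_repeats=100, seed=0):
    """Average the (cv_errors, test_errors) pairs from count_errors over all splits."""
    cv_total = test_total = 0
    for train, test in stratified_splits(labels, n_train, n_repeats, seed):
        cv_errors, test_errors = count_errors(train, test)
        cv_total += cv_errors
        test_total += test_errors
    return cv_total / n_repeats, test_total / n_repeats
```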

Real-time PCR (RT-PCR). The procedures were as described (Birkenkamp-Demtroder)
except that we used short LNA (Locked Nucleic Acid) enhanced probes
from a Human Probe Library (Exiqon). In short, cDNA was synthesized from single
samples, some of which were previously analyzed on GeneChips. Reverse
transcription was performed using Superscript II RT (Invitrogen). Real-time PCR analysis
was performed on selected genes using the primers (DNA Technology) and probes
(Exiqon, DK) described in figure legend X. All samples were normalized to GAPDH
as described previously (Birkenkamp-Demtroder et al., Cancer Res. 62: 4352-4363,
2002).

Rebuilding of Classifier based on Real-Time PCR

The 79 tumor samples that were not analysed by real-time PCR were transformed
into log ratios using one of the tumor samples as reference and used for training of
the classifier. Then 23 samples, of which 18 were also analyzed on arrays, were
likewise transformed into log ratios using the same tumor sample as reference
and tested. The idea behind this translation is that we expect the normalized
PCR values to be proportional to the normalized array values, and on a log scale
this becomes an additive difference. The difference is gene specific and is therefore
estimated for each gene separately. The variation obtained from the microarray
data, and used in the classifier, can be used directly on the PCR platform.
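
The reasoning behind the log-ratio translation can be checked with a small numerical
sketch. All gene names, values and proportionality factors below are hypothetical;
the only assumption taken from the text is that PCR values are proportional to array
values with a gene-specific factor.

```python
from math import log2


def log_ratios(sample, reference):
    """Per-gene log2 ratio of a sample against the reference tumor sample."""
    return {gene: log2(sample[gene] / reference[gene]) for gene in sample}


# Hypothetical normalized array values for two genes.
array_sample = {"GENE_A": 200.0, "GENE_B": 50.0}
array_reference = {"GENE_A": 100.0, "GENE_B": 100.0}

# Assumed gene-specific proportionality between the PCR and array platforms.
factor = {"GENE_A": 0.01, "GENE_B": 3.0}
pcr_sample = {g: factor[g] * v for g, v in array_sample.items()}
pcr_reference = {g: factor[g] * v for g, v in array_reference.items()}

# On the log scale the gene-specific factor is an additive offset, which cancels
# when the same reference sample is used on both platforms.
array_lr = log_ratios(array_sample, array_reference)
pcr_lr = log_ratios(pcr_sample, pcr_reference)
```

Under this assumption `array_lr` and `pcr_lr` agree gene by gene up to rounding, which
is why group means and variances estimated on array log ratios can be reused for PCR
log ratios.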

Results
Hierarchical clustering

The clinical specimens used in this study were collected in two different countries
from 14 different clinics in the period 1994 to 2001. The samples were selected to
keep a balanced representation of microsatellite instable (MSI) and microsatellite
stable (MSS) tumors from both the right- and left-sided colon. The MSI class was
represented both by sporadic MSI and hereditary MSI (HNPCC) tumors. Only
Dukes' B and Dukes' C tumor samples were selected (Table 19). Before any attempt
to divide a diverse sample collection into distinct classes, we analyzed
the data for systematic bias that may have been introduced during the experimental
procedures. A fast and easy way to discover both true distinct classes and
systematic biases in the data is to perform a hierarchical clustering.

The phylogenetic tree resulting from hierarchical clustering on 1239 genes (Fig. 6)
reveals that the main separating factor is microsatellite status. On the upper trunk
we find two clusters represented mainly by normal biopsies (14/21) and MSS tumors
(18/25), respectively. The lower trunk is divided into an MSI cluster (30/36) and a
second MSS cluster (MSS2-cluster) (34/37). A closer inspection of the two MSS
clusters unveils that one is dominated by Danish samples (19/25) and one by Finnish
samples (26/37). Also, it is worth noting that the MSI cluster contains a vast
majority of Finnish samples (32/36) and that the sporadic MSI samples are
interspersed among the hereditary samples. The normal biopsies cluster tightly together
with a slight tendency to separation according to origin. Three normal samples cluster
within the MSI cluster, indicating that resection of these samples may have been too
close to the tumor lesion.

Inspection of the gene cluster dendrogram shows that the two groups of MSS
tumors are mainly separated by a large cluster of genes being upregulated in the
Danish samples (data not shown), indicating a systematic difference between
Danish and Finnish samples.
Significance of observed groups

Based on these observations, we performed a series of tests to evaluate if the
observed separation of tumors into MSS and MSI as well as DK and SF is significant.
For these tests the tumor samples were grouped into four virtual tumor groups
labelled Danish MSI (MSI-DK), Danish MSS (MSS-DK), Finnish MSI (MSI-SF)
and Finnish MSS (MSS-SF). Based on 5082 genes with a variance above 0.2, we
tested if all four groups are significant or if some of the groups can be joined. We
considered the two possibilities of joining DK and SF, and of joining MSI and MSS,
and made a statistical test where the p-value is evaluated through permutations. For
each group combination our test value S1/S2 is considerably smaller than in
all 100 permutations (Table 20), demonstrating a very clear separation
between DK and SF and between MSI and MSS.

Table 20
Permutation test of groups

Pseudo group | S1/S2 from data | Smaller values in 100 permutations | Minimum in 100 permutations
DK-SF | 0.9072795 | 0 | 0.962269
MSI-MSS | 0.9166195 | 0 | 0.9583325


Such a clear distinction between groups may rely on a few highly separating genes
or a general difference in the gene expression profile including many genes. For
both DK-SF and MSI-MSS the effect is caused by many genes even at very
strict criteria, i.e. low values of the test statistic S1(j)/S2(j) (Table 21).
Table 21
Permutation test of genes

Pseudo group | S1(j)/S2(j) | <0.6 | <0.7 | <0.8 | <0.9
DK-SF | number of genes | 36 | 136 | 522 | 1785
 | max in 100 permutations | 0 | 0 | 2 | 225
MSI-MSS | number of genes | 17 | 103 | 399 | 1507
 | max in 100 permutations | 0 | 1 | 8 | 250


When a property is present that influences a large proportion of the genes this may
obscure separation of clinically relevant features in unsupervised clustering. To
visualize the effect of such properties, we calculated distances by multidimensional
scaling between samples with and without the 816 genes separating DK from SF with a
t-value numerically greater than 2 (Fig. 7). We see an improved separation of MSI and
MSS with Danish and Finnish cases mixed. The MSI-DK samples are not
completely separated as they are found both between the MSI-SF and the MSS
samples. (These plots are not entirely unsupervised since the groups have been used to
remove genes.)

Construction of an MSI-MSS classifier

For the construction of a classifier we used the expression profiles from 97 tumors
for which no ambiguity had been identified in relation to microsatellite status. The
816 genes separating DK from SF were excluded, as these would be unreliable for
MS classification. We built a maximum likelihood classifier in order to select a
minimum of genes giving the largest possible separation of the two groups. We tested
the performance of the classifier using 1-1000 genes and found that it was stable,
showing 3-6 errors when using 4-400 genes. Of these, 106 genes were especially
suited for discrimination of MSS from MSI (Table 22).

Table 22

AFFYID | SYMBOL | LOCUSLINK | OMIM | REFSEQ | GENENAME
1405_i_at | CCL5 | 6352 | 187011 | NM_002985 | chemokine (C-C motif) ligand 5
200628_s_at | WARS | 7453 | 191050 | NM_004184 | tryptophanyl-tRNA synthetase
200814_at | PSME1 | 5720 | 600654 | NM_006263 | proteasome (prosome, macropain) activator subunit 1 (PA28 alpha)
201641_at | BST2 | 684 | 600534 | NM_004335 | bone marrow stromal cell antigen 2
201649_at | UBE2L6 | 9246 | 603890 | NM_004223 | ubiquitin-conjugating enzyme E2L 6
201674_s_at | AKAP1 | 8165 | 602449 | NM_003488 | A kinase (PRKA) anchor protein 1
201762_s_at | PSME2 | 5721 | 602161 | NM_002818 | proteasome (prosome, macropain) activator subunit 2 (PA28 beta)
201884_at | CEACAM5 | 1048 | 114890 | NM_004363 | carcinoembryonic antigen-related cell adhesion molecule 5
201910_at | FARP1 | 10160 | 602654 | NM_005766 | FERM, RhoGEF (ARHGEF) and pleckstrin domain protein 1 (chondrocyte-derived)
201976_s_at | MYO10 | 4651 | 601481 | NM_012334 | myosin X
202072_at | HNRPL | 3191 | 603083 | NM_001533 | heterogeneous nuclear ribonucleoprotein L
202203_s_at | AMFR | 267 | 603243 | NM_001144 | autocrine motility factor receptor
202262_x_at | DDAH2 | 23564 | 604744 | NM_013974 | dimethylarginine dimethylaminohydrolase 2
202510_s_at | TNFAIP2 | 7127 | 603300 | NM_006291 | tumor necrosis factor, alpha-induced protein 2
202520_s_at | MLH1 | 4292 | 120436 | NM_000249 | mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli)
202589_at | TYMS | | 188350 | NM_001071 | thymidylate synthetase
202637_s_at | ICAM1 | 3383 | 147840 | NM_000201 | intercellular adhesion molecule 1 (CD54), human rhinovirus receptor
202678_at | GTF2A2 | 2958 | 600519 | NM_004492 | general transcription factor IIA, 2, 12kDa
202762_at | ROCK2 | 9475 | 604002 | NM_004850 | Rho-associated, coiled-coil containing protein kinase 2
203008_x_at | APACD | 10190 | | NM_005783 | ATP binding protein associated with cell differentiation
203315_at | NCK2 | 8440 | 604930 | NM_003581 | NCK adaptor protein 2
203335_at | PHYH | 5264 | 602026 | NM_006214 | phytanoyl-CoA hydroxylase (Refsum disease)
203444_s_at | MTA2 | 9219 | 603947 | NM_004739 | metastasis-associated gene family, member 2

203559_ 
s at 


ABP1 


26 


104610 


NM 001091 


amiloride binding proton 1 (amine oxidase (copper- 
containing)) 


203773 
X at 


BLVRA 


644 


109750 


NM 000712 


biliverdin reductase A 


203896_ 
s at 


PLCB4 


5332 


600810 


NM 000933 


phosplioiipase C. beta 4 


203915 
at 


CXCL9 


4283 


601704 


NM 002415 


chemoldne (C-X-C motif) ligand 9 


204020 
at 


PURA 


5813 


600473 


NM 005859 


purine-rich eiem^t binding protein A 


at 


QPRT 


23475 


606248 


NM 014298 


quinoiinate pliosphoritiosyltransferase (nicotinate- 
nudeottde pyrophosphorvlase (carboxylating)) 


at 


RARRES3 


5920 


605092 


NM 004585 


retinotc add receptor responder (tazarotene in- 
duced) 3 


2041 03 
at 


CCL4 


6351 


182284 


NM 002984 


chemoltine (C-C motIO ligartd 4 


2041 31 _ 
s at 


F0X03A 


2309 


602681 


NM 001455 


forkhead box 03A 


204326 
X at 


MT1X 


4501 


156359 


NM 005952 


metallolhionein 1X 


204415 
at 


G1P3 


2537 


147572 


NM 002038. 
NM 022873 


interferon, alpha-indudble protein (clone IFI-6-16) 


204533_ 
at 






147310 


NM 001565 


chemol^ine (C-X-C motif) ligand 10 


204745 
X at 


MT1G 


4495 


156353 


NM 005950, 
NM 005950 


metailotliionein 1G 


204780 
s at 


TNFRSF6 


355 


134637 


NM 000043. 
NM 152877, 
NM_1 52876. 
NM 152875 
NMIi 52872! 
NM 152873, 
NM~1 52871* 


tumor necrosis factor receptor super^mily, memlier 
6 


s at 


ECGF1 


1890 


131222 


NM 001953 


endotliellal cell growth factor 1 (platelet-derived) 


at 


SC02 


9997 


604272 


NM 005138 


SCO cytoclirome oxidase deficient homolog 2 
(yeast) 


205242 
at 


CXCL13 


10563 


605149 


NM 00641,g> 


chemoldne (C-X-C motiO ligand 13 (B-cell chemoat- 
iractant) 


205495_ 
5 at 


GNLY 


10578 


188855 


NM 006433, 
NM 006433 


granulysin 


205831 
at 


CD2 


914 


186990 


NM 001767 


CD2 antigen (p50), sheep red blood cell receptor 


2061 08_ 
s at 


SFRS6 


6431 


601944 


NM 006275 


splicing factor, arginine/serine-rich 6 


206286_ 
5 at 


TDGF1 


6997 


187395 


NM 003212 


teratocarcinoma-derived growth factor 1 


206461_ 
X at 


MT1H 


4496 


156354 


NM 005951 


mefallothioneln 1H 


206754_ 

s at 


CYP2B6 


. 1555 


123930 


NM 000767 


cytochrome P450, family 2, subfamily B. polypepUde 

6 


206907 
at 


TNFSF9 


1 8744 


606182 


NM 003811 


tumor necrosis factor (ligand) suoerfamily, member 9 
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206918_s_ 
at 


RBM12 


10137 


607179 


NM_006047. 
NM 006047 




206976_s„ 
at 


HSPH1 


10808 




NM 006644 


heat shock 105kDa/110kDa protein 1 


207320 x_ 
at 


STAU 


5780 


601716 


NM 004602. 
NM 004602, 
NM 017452, 
NM 017453 


staufen. RNA binding protein (Dresophila) 


207457 s 
at 


LY6G6D 


58530 


606038 


NM 021246 


lymphiocyte antigen 6 complex, locus G6D 


207993_s 
at 


CHP 


11261 


606988 


NM 007236 


calcium binding protein P22 


208Q22_s_ 
at 


CDC14B 


8555 


603505 


NM 003671, 
NM_003671, 
NM Udoool 


CDC14 cell division cyde 14 liomolog B (S. cere- 


208156_x_ 
at 


EPPK1 


83481 






epiplakin 1 


208581 X 
at 


MT1X 


4501 


156359 


NM 005952 


metaliotiilonein 1X 


208944 at 


TGFBR2 


7048 


190182 


NM 003242 


transforming growtti factor, beta receptor 11 
(70/80kDa) 


209048_8_ 
at 




23613 




NM 012408. 
NM_012408, 


nmtoin IrinsQA Ct hinriinn nrnt^in 1 


209108 at 


TM4SF6 


7105 




NIVl UUo.£/U 


frorkcnrtomKranA A iriPrfpmtlv mpmhpr fi 
UanollicIIIUi di ou idJ 1 til y iiiciiiur:! 


209504 s 
at 


PLEKHB1 


58473 


607651 


NM 021200 


pleckstrln homology domain containing, family B 
(evectins) member 1 


209546_S_ 


APOL1 


8542 


603743 


NM 003661, 
NM_003561, 
NM 145343 


apotipoprotein L, 1 




'NDO 


3620 


147435 


NM 002154 


indoleamine-pyrrole 2,3 dioxygenase 


210103 S_ 
at 


FOXA2 


3170 


600288 


NM 021784, 
NM 021784 


forkhead box A2 


210321 at 


GZMH 


2999 


116831 


NM 033423 


granzyme H (catliepsin G-like 2, protein h-CCPX) 


210538 s 
at 


BIRC3 


330 


601721 


NM 001165. 
NM 001165 


baculoviral lAP repeat-containing 3 


211456_x_ 
at 












212057 at 


KIAA0182 


23199 




XM 050495 


KiAA0182 protein 


212070 at 


GPR56 


9289 


604110 


NM 005682 


G protein-KVUpted receptor 56 


212185_x_ 
at 


MT2A 


4502 


1 55360 


NM 005953 


metaiiOuiionein za 


212229 S 
at 


FBX021 


23014 




NM 015002. 
NM 015002 


F-box only protein 21 


212336 at 


EPB41L1 


2036 




NM_012156, 


€ryiiirocyi6 rri&rrturoiic ijilaciii uciiiij H"* i*"iiw9 i 


212341 at 


MGC21416 


286451 




NM 173834 


hypothetical protein MGC21416 


212349 at 


POFUT1 


23509 




NM_015352. 

INIVl U lOOZ)^ 




212859_x_ 
at 


MT1E 




1 56351 


I>IIV1 If OD 1 J 


motallnfhir^noin iP /fiinftfinnf^n 


213201 s 
at 


TNNT1 


7138 


191041 


NM 003283. 
NM 003283, 
XM 352926 


troponin T1, skeletal, slow 


213385 at 


CHN2 


1124 


602857 


NM 004067 


chtmerin (chlmaerin) 2 


213470 s 
at 


HNRPH1 


3187 


601035 


NM 005520 


heterogeneous nuclear ribonucleoprotein H1 (H) 


213738_S_ 
at 


ATP5A1 


498 


164360 


NM 004046 


ATP synthase, H+ transporting, mitochondrial F1 
complex, alpha subunit. Isoform 1, cardiac muscle 


213757 at 


EIF5A 


1984 


600187 


NM 001970 


eukaryotic translation initiation factor 5A 
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214617 at 


PRF1 


5551 


1 70280 


NM 005041 


perforin 1 (pore forminq protein) 


214924 s 
at 


OIP105 


22906 


608112 


NM 014965 


OGT(0-Glc-NAc transferaseHnteracting protein 106 
KDa 


215693 X 
at 


DDX27 


55661 




NM 01 7895 


DEAD (Asp-Glu-Ala-Asp) box polypeptide 27 


215780_s 
at 


Hs.382039 










216336 X 
at 


AL031602 










217727 X 
at 


VPS35 


55737 


606931 


NM 018206 


vacuolar protein sortinq 35 (yeast) 


217759 at 


TR1M44 


54765 




NM 017583 


tripartite motif-containing 44 


217875 s 
at 


TMEPAI 


56937 


606564 


NM 020182, 
NM_020182,' 
NM 199169, 
NM 199170* 


transmembrane, prostate androgen induced RNA 


217917 s 
at 


DNCL2A 


83658 


607167 


NM 014183, 
NM 014183, 
NM 177953 


dynein. cytoplasmic, ilght polypeptide 2A 


217933_s_ 
at 


_LAP3 


51056 


1 70250 


NM 015907 


leucine aminopeptidase 3 


218094 s 

at 


C20orf35 


55861 




NM_018478. 
NM 018478 


chromosome 20 open reading frame 35 


218237_s_ 
at 


SLC38A1 


81539 




MM mnR7/i 

IN(V1 UOUO/*f 


solute earner family 38, member 1 


21 8242 s 
at 


CGI-85 


51111 




NM 016028, 
NM 016028 


CGi-85 protein 


at 


DATF1 


11083 


604140 


NM_022105, 
INIvl U^1UO, 

NM 080796 


death associated transcription factor 1 


218345 at 


HCA112 


55365 




NM 018487 


hepatocellular carcfnoma-associated antigen 112 


218346 s 
at 


SESN1 


27244 


6061 03 


NM 014454 


sestrin 1 


218704 at 


FLJ20315 


54894 




NM 017763 


hypothetical protein FLJ20315 


218802 at 


FLJ20647 


55013 




NM 017918 


hypotheUcal protein FLJ20647 


218898 at 


CT120 


79850 




NM 024792 


membrane protein expressed in epithelial-like lung 
adenocarcinoma 


218943 s 
at 


RIG-1 


23586 




NM 014314 


DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 


218963 s 
at 


KRT23 


25984 


606194 


NM 015515, 
NM 015515 


keratin 23 (histone deacetylase inducible) 


219956 at 


GALNT6 


11226 


605148 


NM 007210 


U D P-N-acety l-al pha-D-galactosamin e'polypeptide N- 
acetylgalactosamlnyltransferase 6 (GaiNAc-T6) 


220658_s_ 
at 


ARNTL2 


56938 




NM 020183 


aryl hydrocarbon receptor nuclear translocator-IIke 2 


220951 s 
at 


ACF 


29974 




NM 014576, 
NM 014576. 
NM 138932 


apobec-1 complementation factor 


221516 s 
at 


FLJ20232 


54471 




NM 019008 


hypothetical protein FLJ20232 


221653 X 
at 


AP0L2 


23780 


607252 


NM 030882, 
NM 030882 


apolipoprotein L. 2 


221920 s 
at 


MSCP 


51312 




NM 016612, 
NM 016612 


mitochondrial solute carrier protein 


222244 s 
at 


FLJ20618 


55000 




NM 017903 hypothetical protein FU20618 

The minimum of three errors was found even using only 7 genes (Table 23).

Table 23. Genes used for the classification of MSS vs MSI tumors

Name | Symbol | Unigene | MSS | MSI
hepatocellular carcinoma-associated antigen 112 | HCA112 | Hs.12126 | 1261 | 653
metastasis-associated 1-like 1 | MTA1L1 | Hs.173043 | 45 | 91
chemokine (C-X-C motif) ligand 10 | CXCL10 | Hs.2248 | 104 | 274
heterogeneous nuclear ribonucleoprotein L | HNRPL | Hs.2730 | 194 | 630
hypothetical protein FLJ20618 | FLJ20618 | Hs.52184 | 776 | 388
splicing factor, arginine/serine-rich 6 | SFRS6 | Hs.6891 | 74 | 446
protein kinase C binding protein 1 | PRKCBP1 | Hs.75871 | 294 | 168


Classification of ambiguous samples

Application of the 7-gene classifier to the four samples showing ambiguity in the
microsatellite analyses assigns all four to the microsatellite stable tumor class.
Notably, all four showed expression levels of transforming growth factor β induced
protein (TGFBI), MLH1 and thymidylate synthase (TYMS) that are atypical for MSI tumors.
Furthermore, these tumors were all from the left colon. Thus the misclassified tumors
are either truly MSS or they belong to a yet undefined class of MSI tumors.

Stability of classification 

To estimate the stability of the classifier based on all 97 tumor samples, we
generated one hundred new classifiers based on randomly chosen datasets consisting of
30 MSS and 25 MSI samples. In each case the classifiers were tested with the
remaining samples. The performance for each set was evaluated and averaged over
all 100 training and test sets (Table 24). The mean error rate was 0.52% for MSS
tumors and 1.38% for MSI tumors. The seven genes defined above were the genes most
frequently used in the crossvalidation loop. More than 50% of the errors were related
to three tumors, of which two were wrongly classified in all permutations and one in
94%. The remaining errors were mainly caused by four tumors with error rates of
40-47%. This indicates that the former three samples are consistently assigned
contrary to the result of the microsatellite analysis, and that four samples could not
be assigned with confidence to any of the classes.
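
The resampling scheme described above can be sketched as repeated random training and
test splits. This is an illustration under stated assumptions rather than the code
used: train_fn and predict_fn are hypothetical callables standing in for the
classifier, and the random number handling is our own choice.

```python
import random

def resampling_error_rates(samples, labels, train_fn, predict_fn,
                           n_train=None, repeats=100, seed=0):
    """Mean test error per class over repeated random training/test splits.

    train_fn(samples, labels) -> model and predict_fn(model, sample) -> label
    are supplied by the caller (hypothetical interface).
    """
    n_train = n_train or {"MSS": 30, "MSI": 25}
    rng = random.Random(seed)
    errors = {cls: [] for cls in n_train}
    for _ in range(repeats):
        # Draw the training set: a fixed number of samples from each class.
        train_idx = set()
        for cls, k in n_train.items():
            members = [i for i, l in enumerate(labels) if l == cls]
            train_idx.update(rng.sample(members, k))
        ordered = sorted(train_idx)
        model = train_fn([samples[i] for i in ordered],
                         [labels[i] for i in ordered])
        # Test on the samples that were left out of this training set.
        for cls in n_train:
            test = [i for i, l in enumerate(labels)
                    if l == cls and i not in train_idx]
            if test:
                wrong = sum(predict_fn(model, samples[i]) != cls for i in test)
                errors[cls].append(wrong / len(test))
    return {cls: sum(v) / len(v) for cls, v in errors.items() if v}
```

Recording, in addition, which samples are misclassified in each repeat would give the
per-tumor error rates discussed in the text.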

Table 24. Performance of the classifier

Group | Training set: errors in crossvalidation | Test set: test errors
MSI | 2.8% (n=25, range 0-6) | 1.4% (n=10, range 0-4)
MSS | 0.70% (n=30, range 0-3) | 0.52% (n=29, range 0-2)
All | 1.7% (n=55, range 1-7) | 1.9% (n=39, range 0-5)



Table 25. Sensitivity, Specificity, and Predictive Value of Test for MSS based on the
eight gene Classifier

Positive for MSS | True = (0.9948*29) = 28.8492 | False = (0.138*10) = 1.38
Negative for MSS | False = (0.0052*29) = 0.1508 | True = (0.962*10) = 9.62

Sensitivity | 28.9507/29 | 99.5%
Specificity | 9.62/10 | 96.2%
Positive predictive value | 28.8492/30.2292 | 95.4%
Negative predictive value | 9.62/9.7708 | 98.5%

*Based on a prevalence for MSS of 85%
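
The arithmetic in Table 25 can be retraced with a short sketch. The function and
argument names are illustrative and not taken from the source. Sensitivity and
specificity are divided by the group sizes (29 MSS, 10 MSI) as the table does, and the
cell values are those printed in the table.

```python
def predictive_values(tp, fp, fn, tn, n_pos, n_neg):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Cell values as printed in Table 25 (MSS is the "positive" class).
table25 = predictive_values(tp=0.9948 * 29, fp=1.38, fn=0.0052 * 29, tn=9.62,
                            n_pos=29, n_neg=10)
for name, value in table25.items():
    print(f"{name}: {100 * value:.1f}%")
```

With the printed cell values this reproduces the 99.5%, 96.2%, 95.4% and 98.5% given
in the table.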




Survival classifier 

Using the same classification methods described above, we built classifiers for
survival based on either all samples or the above-defined groups of MSI-H and MSS.
As seen in Figure 10, a distinction of patients with good prognosis (> 5 years
survival) from patients with bad prognosis (< 5 years survival) can be achieved with
higher precision, and using only a fraction of the genes, by first separating into
MSI-H and MSS groups.

Construction of a classifier for sporadic versus hereditary microsatellite instable
tumors

In order to identify a gene set for identification of hereditary microsatellite instable
tumors, we applied 19 sporadic microsatellite instable samples and 18 hereditary
microsatellite instable samples to supervised classification as described above. We
found ten genes that scored high for separation of sporadic MSI-H from hereditary MSI-H
tumours (Table 26). In crossvalidation we found a minimum of one error using two genes
(Fig. 9A), which were used in at least 36 of the 37 crossvalidation loops. The genes
were the mismatch repair gene MLH1, which shows a general downregulation in sporadic
disease, and PIWIL1, which has lower expression in hereditary cases (Fig. 9B). Using
these two genes only one error occurred: a sporadic microsatellite instable tumor
was classified as hereditary. Based on the t-test, we performed 500 permutations to
test the significance of these two genes as marker genes and found both genes highly
significant, with p-values < 0.005.
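
A permutation test of this kind can be sketched as follows. The text states only that
500 permutations based on the t-test were performed; the Welch form of the statistic,
the two-sided comparison, the add-one correction and the function names are
assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance

def t_stat(a, b):
    """Welch t-statistic for two groups of expression values (assumed form)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def permutation_p_value(a, b, n_perm=500, seed=0):
    """Two-sided permutation p-value for the difference between groups a and b."""
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        if abs(t_stat(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Here a and b would hold the expression values of one gene in the sporadic and the
hereditary group respectively.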

Table 26

AFFYID | SYMBOL | LOCUSLINK | OMIM | REFSEQ | AFFYDESCRIPTION
206194_at | HOXC6 | 3223 | 142972 | NM_004503 | Homeo box C4
214868_at | PIWIL1 | 9271 | 605571 | [illegible] | piwi (Drosophila)-like 1
202520_s_at | MLH1 | 4292 | 120436 | NM_000249.2 | MutL (E. coli) homolog 1 (colon cancer, nonpolyposis type 2)
202517_at | CRMP1 | 1400 | 602462 | [illegible] | Collapsin response mediator protein 1
205453_at | HOXB2 | 3212 | 142967 | NM_002145.2 | Homeo box B2 (HOXB2)
217791_s_at | PYCS/ADH18A1 | 5832 | 138250 | NM_002860.2 | Pyrroline-5-carboxylate synthetase (glutamate gamma-semialdehyde synthetase) (PYCS/ADH18A1)
202393_s_at | TIEG | 7071 | 601878 | NM_005655.1 | TGFB inducible early growth response (TIEG)
218803_at | CHFR | 55743 | 605209 | NM_018223.1 | Checkpoint with forkhead and ring finger domains (CHFR)
219877_at | FLJ13842 | 79698 |  | NM_024645.1 | Hypothetical protein FLJ13842 (FLJ13842)
202241_at | C8FW | 10221 |  | NM_025195.2 | Phosphoprotein regulated by mitogenic pathways (C8FW)


Cross platform classification 

Real-time PCR was applied both to verify the array data and to examine whether the
7-gene classifier would also perform on this platform. We chose 23 samples, of which
18 were also analyzed on arrays. The correlation between the two platforms was high
(data not shown). In order to test the performance of classification using PCR data,
we rebuilt our classifier with a 79-sample array dataset including only those tumors
that were not analyzed with PCR. Two samples were classified in discordance with
the microsatellite instability test, one of which was ambiguously classified by the
7-gene array classifier.


Relation between microsatellite-instability status, stage and survival 

Based on the 7-gene classifier, of 36 patients with Dukes' B tumors receiving no
adjuvant chemotherapy, 18 were classified as MSI tumors and 18 as MSS tumors. The
overall survival was highly significantly related to the classification, since all
nine patients who died within five years of follow-up belonged to the MSS group
(P=0.0014) (Fig. 10A). Thus, the 7-gene classifier proved to be a strong predictor of
survival in Dukes' B, and it can be used to select patients who need adjuvant
chemotherapy, namely those classified as MSS.

Among 65 patients with Dukes' C tumors receiving adjuvant chemotherapy, 17 were
classified as MSI tumors and 48 as MSS tumors. Of these, 6 MSI and 27 MSS patients
died within five years of follow-up, meaning no significant difference in overall
survival between these groups (P=0.55) (Fig. 10B). A trend was that the MSI patients
showed a poorer short-term survival than the MSS patients, contrary to the Dukes' B
patients. This difference may be explained by the finding of a recent large study that
chemotherapy only benefits the MSS tumor patients, thus improving their survival to a
level comparable to that which is characteristic of MSI tumor patients.

Clinical application of the discovery 

In the clinic the 106 or fewer genes described can be used for predicting the outcome
of colorectal cancer when examined at the RNA level and also at the protein level, as
each gene identified in the project is transcribed to RNA that is further translated
into protein. The genes can also be used to determine which patients should be treated
with chemotherapy, as only non-microsatellite instable tumors will respond to 5-FU
based therapy. Building classifiers can achieve a further stratification of patients
with good and bad prognosis after stratification into microsatellite instable and
stable tumors. The genes used to identify hereditary disease can be used to decide
which patients should enter into sequencing analysis of mismatch repair genes.

The RNA determination can be made in any form using any method that will quantify
RNA. The proteins can be measured with any quantification method that can determine
the level of proteins.